Abstract This work initiates research into the problem of determining an optimal
investment strategy for investors with different attitudes towards the
trade-offs of risk and profit. The probability distributions of the return
values of the stocks that are considered by the investor are assumed to
be known, while the joint distribution is unknown. The problem is to
find the best investment strategy in order to minimize the probability
of losing a certain percentage of the invested capital based on different
attitudes of the investors towards future outcomes of the stock market.
For portfolios made up of two stocks, this work shows how to
exactly and quickly solve the problem of finding an optimal portfolio for
aggressive or risk-averse investors, using an algorithm based on a fast
greedy solution to a maximum flow problem. However, an investor
looking for an average-case guarantee (so is neither aggressive nor risk-averse)
must deal with a more difficult problem. In particular, it is #P-complete
to compute the distribution function associated with the average-case
bound. On the positive side, approximate answers can be computed by
using random sampling techniques similar to those for high-dimensional
volume estimation. When k > 2 stocks are considered, it is proved that
a simple solution based on the same flow concepts as the 2-stock
algorithm would imply that P = NP, so is highly unlikely. This work gives
approximation algorithms for this case as well as exact algorithms for
some important special cases.

Keywords: risk management, portfolio optimization, computational hardness, ap- 
proximation algorithms, greedy strategies, network flows, volume esti- 
mation, random walks. 



1. Introduction 

This work initiates the study of the risk profile problem for stock 
portfolio optimization. The problem has several variants depending on 
a given investor's preference toward the trade-off between risk and return 
[Sharpe et al., 1995]. 

In the problem, the investor has a capital, which is normalized to
one dollar. She considers $k$ different stocks $S_1, \ldots, S_k$ and wishes to
invest some $x_i$ dollars in each stock $S_i$ for a certain period of time,
where $\sum_{i=1}^{k} x_i = 1$ and $x_i \ge 0$ for all $i$. The vector
$\vec{x} = (x_1, x_2, \ldots, x_k)$ is called a portfolio. Let $\mathcal{P}_k$ be the set of all portfolios
for $k$ stocks. The return of $\vec{x}$ is the ratio, expressed as a percentage, of
the worth of this portfolio at the end of the investment period to the
initial investment of one dollar. The return of stock $S_j$ is the ratio of its
price at the end of the investment period to its initial price, which is the
same as the return of the portfolio $(x_i)_{i=1}^{k}$ with $x_j = 1$ and all the other
$x_i = 0$.

In mathematical finance, stock prices are often assumed to follow
geometric Brownian motions or their variants (e.g., see [Duffie, 1996,
Elliott and Kopp, 1999, Fouque et al., 2000, Hull, 2000, Karatzas, 1997,
Karatzas and Shreve, 1998, Musiela and Rutkowski, 1997]). To complement
this conventional approach with computer science methodologies
[Cormen et al., 1990], we assume that stock prices can move arbitrarily.

Let $\mu$ be a positive real number. Let $m_1$ and $m_2$ be integers with
$m_1 < m_2$, and let $m = m_2 - m_1 + 1$. Let
$\Delta = \{\ell\mu \mid \ell = m_1, \ldots, m_2\}$. Each
stock $S_i$ is associated with a discrete probability distribution $\mathcal{S}_i$ over $\Delta$,
where $\mathcal{S}_i(\beta)$ is the probability that the stock's return is $\beta\%$. For the
sake of technical convenience, we allow $m_1$ and $m_2$ to be negative. The
probability distributions $\mathcal{S}_1, \ldots, \mathcal{S}_k$ are part of the input in our problem
and are obtainable, e.g., by observing historical market data. We assume
that non-zero values satisfy $\mathcal{S}_i(\beta) \ge 1/n^c$ for some constant $c$, and
when representation is important we assume that these values can be
represented as fixed-point numbers with $O(\log n)$ bits. The parameters
$\mu$, $m_1$, and $m_2$ control the precision and range of such observations.
For instance, for $\mu = 1$, $m_1 = 0$, and $m_2 = 200$, the set of possible
returns is 0%, 1%, ..., 200%. The joint distribution of the $k$ probability
distributions $\mathcal{S}_i$ is usually unavailable for a variety of practical reasons.
In particular, a joint distribution consists of $m^k$ entries and thus would
require observing a number of data points that is exponential in $k$.

The investor's goal is to find a portfolio x, which is optimal according 
to her risk preference in six basic cases as follows. For a risk-averse 
investor, minimizing loss is more important than maximizing win, while 
an aggressive investor has the opposite priority. Each of these two in- 
vestor types can be further classified into three subtypes, namely, best- 
case, worst-case, and average-case, referring to whether the probability 
of loss or win is estimated in the best, worst, or average case over the 
feasible joint distributions. More precisely, for each of these six types,
the investor first chooses a target return $\alpha$ and then looks for a
portfolio $\vec{x}$ that optimizes one of the following six probabilities:

■ $\mathcal{RA}_b(\alpha,\vec{x})$ (respectively, $\mathcal{RA}_w(\alpha,\vec{x})$ or $\mathcal{RA}_a(\alpha,\vec{x})$) is the smallest
(respectively, largest or average) probability that the return of $\vec{x}$
is at most $\alpha\%$ over all joint distributions for $\mathcal{S}_1, \ldots, \mathcal{S}_k$.

■ $\mathcal{AG}_b(\alpha,\vec{x})$ (respectively, $\mathcal{AG}_w(\alpha,\vec{x})$ or $\mathcal{AG}_a(\alpha,\vec{x})$) is the largest
(respectively, smallest or average) probability that the return of $\vec{x}$ is
at least $\alpha\%$ over all joint distributions for $\mathcal{S}_1, \ldots, \mathcal{S}_k$.

If the investor is best-case (respectively, worst-case or average-case)
risk-averse, she would choose $\vec{x}$ to minimize $\mathcal{RA}_b(\alpha,\vec{x})$ (respectively,
$\mathcal{RA}_w(\alpha,\vec{x})$ or $\mathcal{RA}_a(\alpha,\vec{x})$). In contrast, if the investor is best-case
(respectively, worst-case or average-case) aggressive, she would choose $\vec{x}$ to
maximize $\mathcal{AG}_b(\alpha,\vec{x})$ (respectively, $\mathcal{AG}_w(\alpha,\vec{x})$ or $\mathcal{AG}_a(\alpha,\vec{x})$).

While the risk profile problem originates from a very applied field,
the corresponding mathematical model has a substantial combinatorial
structure. In the cases where the investor is highly risk-averse or highly
aggressive, we can model the problem as a network flow problem. Quite
surprisingly, in the two-stock case, this flow problem is solvable by a
simple greedy algorithm in $O(m)$ time. In contrast, for the three-stock case,
the applicability of a greedy flow-based algorithm would imply P = NP.
If the number k of stocks is part of the input, we give an exact algorithm 
based on linear programming which takes time polynomial in the num- 
ber of entries of a corresponding contingency table but exponential in 
the input size. To supplement this algorithm, we also give a polynomial- 
time approximation algorithm based on linear programming. We further 
present an exact polynomial-time algorithm in the practical case where 
the capital can only be broken up into a fixed number of units (e.g.,
cents).

It remains open whether this problem is NP-complete if the number
of stocks is part of the input. We strongly suspect that this is indeed 
the case. 

In the case of an average-case investor we show #P-hardness of the
problem of computing the distribution function over various probability
bounds, a natural first step in solving the average-case investor problem.
This hardness result holds even in two dimensions, and we describe an 
approximation algorithm for this case. This algorithm uses a random 
walk approach to sample from the feasible joint distributions, and is 
closely related to volume computation and sampling from log-concave 
distributions. 

Section 2 defines some notation. Section 3 discusses the case where 
there are only two stocks under consideration. Section 4 discusses the 
case of general k. Due to page limitations, all figures are placed in the 
appendix (these figures are helpful in understanding the material, but 
are not strictly necessary). 

2. Notation 

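Let $\vec{\delta} \in \Delta^k$ denote a vector $(\delta_1, \ldots, \delta_k)$, where $\delta_i \in \Delta$. Let
$M = (M_{\vec{\delta}})_{\vec{\delta} \in \Delta^k}$
denote a $k$-dimensional matrix indexed by $\Delta^k$. Let $\mathcal{M}_k$ denote the set of
$k$-dimensional matrices for all possible joint distributions of $\mathcal{S}_1, \ldots, \mathcal{S}_k$;
i.e., $\mathcal{M}_k$ consists of all matrices $M = (M_{\vec{\delta}})_{\vec{\delta} \in \Delta^k}$
where (1) $M_{\vec{\delta}}$ is the probability that the return of stock $S_i$ is $\delta_i\%$ for
$i = 1, \ldots, k$, and (2) thus for all $\vec{\delta} \in \Delta^k$, $M_{\vec{\delta}} \ge 0$ and for all $\beta \in \Delta$ and
$j = 1, \ldots, k$,

$\mathcal{S}_j(\beta) = \sum_{\vec{\delta} \in \Delta^k:\ \delta_j = \beta} M_{\vec{\delta}}.$

For instance, $\mathcal{M}_k$ contains the matrix $M$ defined by

$M_{\vec{\delta}} = \prod_{i=1}^{k} \mathcal{S}_i(\delta_i).$

Also, in the two-stock case, each $M \in \mathcal{M}_2$ is just a two-dimensional
$m \times m$ matrix, where for all $\delta_1, \delta_2 \in \Delta$, the entries of $M$ in column $\delta_1$
sum up to $\mathcal{S}_1(\delta_1)$ and those in row $\delta_2$ sum up to $\mathcal{S}_2(\delta_2)$.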
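The product matrix mentioned above is the easiest member of the feasible set to construct, and it gives a quick way to check the marginal conditions. The following Python sketch is illustrative only: the function names `product_joint` and `marginal`, the use of 0-based positions instead of return values, and the sample marginals are all assumptions made here, not part of the paper.

```python
from fractions import Fraction
from itertools import product


def product_joint(marginals):
    """Build the joint table whose entries are products of the marginals.

    marginals[i][p] is the probability that stock i takes its p-th return.
    The result maps an index tuple (p_1, ..., p_k) to a probability.
    """
    joint = {}
    for idx in product(*(range(len(s)) for s in marginals)):
        p = Fraction(1)
        for stock, pos in enumerate(idx):
            p *= marginals[stock][pos]
        joint[idx] = p
    return joint


def marginal(joint, stock, size):
    """Sum the joint table over all coordinates except `stock`."""
    out = [Fraction(0)] * size
    for idx, p in joint.items():
        out[idx[stock]] += p
    return out


# Hypothetical two-stock instance with two possible returns each.
s1 = [Fraction(1, 4), Fraction(3, 4)]
s2 = [Fraction(1, 2), Fraction(1, 2)]
M = product_joint([s1, s2])
```

Summing `M` along either coordinate recovers `s1` and `s2`, which is exactly condition (2) in the definition of the feasible set. Any other feasible table has the same marginals but distributes the mass differently among the cells.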
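Given a portfolio $\vec{x} \in \mathcal{P}_k$ and a target return $\alpha$, let

$L(\alpha,\vec{x}) = \{\vec{\delta} \in \Delta^k \mid \sum_i x_i \delta_i \le \alpha\},$

$L^{**}(\alpha,\vec{x}) = \{\vec{\delta} \in \Delta^k \mid \sum_i x_i \delta_i < \alpha\},$

$U(\alpha,\vec{x}) = \{\vec{\delta} \in \Delta^k \mid \sum_i x_i \delta_i \ge \alpha\},$

$U^{**}(\alpha,\vec{x}) = \{\vec{\delta} \in \Delta^k \mid \sum_i x_i \delta_i > \alpha\},$

which are the sets of the indices of all entries in the matrices in $\mathcal{M}_k$ such
that the return of $\vec{x}$ is at most, less than, at least, and more than $\alpha\%$,
respectively. We further define the following functions on $M \in \mathcal{M}_k$:

$L_{\alpha,\vec{x}}(M) = \sum_{\vec{\delta} \in L(\alpha,\vec{x})} M_{\vec{\delta}},$

$L^{**}_{\alpha,\vec{x}}(M) = \sum_{\vec{\delta} \in L^{**}(\alpha,\vec{x})} M_{\vec{\delta}},$

$U_{\alpha,\vec{x}}(M) = \sum_{\vec{\delta} \in U(\alpha,\vec{x})} M_{\vec{\delta}},$

$U^{**}_{\alpha,\vec{x}}(M) = \sum_{\vec{\delta} \in U^{**}(\alpha,\vec{x})} M_{\vec{\delta}},$

which are the probabilities in the joint distribution $M$ that the return
of $\vec{x}$ is at most, less than, at least, and more than $\alpha\%$, respectively.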
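Formally, if $u_{\mathcal{M}_k}(M)$ is a uniform density over $\mathcal{M}_k$,

$\mathcal{RA}_b(\alpha,\vec{x}) = \min_{M \in \mathcal{M}_k} L_{\alpha,\vec{x}}(M);$ (1.1)

$\mathcal{RA}_w(\alpha,\vec{x}) = \max_{M \in \mathcal{M}_k} L_{\alpha,\vec{x}}(M);$ (1.2)

$\mathcal{RA}_a(\alpha,\vec{x}) = \int_{\mathcal{M}_k} L_{\alpha,\vec{x}}(M)\, u_{\mathcal{M}_k}(M)\, dM;$ (1.3)

$\mathcal{AG}_b(\alpha,\vec{x}) = \max_{M \in \mathcal{M}_k} U_{\alpha,\vec{x}}(M);$ (1.4)

$\mathcal{AG}_w(\alpha,\vec{x}) = \min_{M \in \mathcal{M}_k} U_{\alpha,\vec{x}}(M);$ (1.5)

$\mathcal{AG}_a(\alpha,\vec{x}) = \int_{\mathcal{M}_k} U_{\alpha,\vec{x}}(M)\, u_{\mathcal{M}_k}(M)\, dM.$ (1.6)

For example, in the two-stock case, $L(\alpha, (x_1, x_2))$ is the set of all indices
in a two-dimensional table $M$ in $\mathcal{M}_2$ on or below the line $x_1\delta_1 + x_2\delta_2 = \alpha$,
and $\mathcal{RA}_w(\alpha, (x_1, x_2))$ maximizes the sum of the entries in this region
under the condition that $M$ has the given column and row sums of
$\mathcal{S}_1(m_1\mu), \ldots, \mathcal{S}_1(m_2\mu), \mathcal{S}_2(m_1\mu), \ldots, \mathcal{S}_2(m_2\mu)$.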
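To make the definitions concrete, the following Python sketch evaluates the "at most" and "more than" probabilities for three hand-built two-stock tables that share the same marginals. The function names `at_most` and `more_than`, the return values 0% and 100%, and the three sample tables are assumptions chosen here for illustration.

```python
from fractions import Fraction


def at_most(M, returns, x, alpha):
    """Mass of the cells whose portfolio return is <= alpha."""
    return sum(p for (i, j), p in M.items()
               if x[0] * returns[i] + x[1] * returns[j] <= alpha)


def more_than(M, returns, x, alpha):
    """Mass of the cells whose portfolio return is > alpha."""
    return sum(p for (i, j), p in M.items()
               if x[0] * returns[i] + x[1] * returns[j] > alpha)


half = Fraction(1, 2)
zero = Fraction(0)
returns = [0, 100]          # possible returns in percent (hypothetical)
x = (half, half)            # equal-weight portfolio
alpha = 50

# Three feasible tables, all with marginals (1/2, 1/2) for both stocks.
independent = {(i, j): Fraction(1, 4) for i in range(2) for j in range(2)}
opposite = {(0, 0): zero, (0, 1): half, (1, 0): half, (1, 1): zero}
together = {(0, 0): half, (0, 1): zero, (1, 0): zero, (1, 1): half}

values = [at_most(T, returns, x, alpha)
          for T in (independent, opposite, together)]
```

For this toy instance the three tables give 3/4, 1, and 1/2 respectively, so the choice of joint distribution alone moves the loss probability across a wide range. In each table the "at most" and "more than" masses add up to one, which is the observation behind Lemma 1.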

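For technical convenience, we also define the following terms:

$\mathcal{RA}^{**}_b(\alpha,\vec{x}) = \min_{M \in \mathcal{M}_k} L^{**}_{\alpha,\vec{x}}(M);$ (1.7)

$\mathcal{RA}^{**}_w(\alpha,\vec{x}) = \max_{M \in \mathcal{M}_k} L^{**}_{\alpha,\vec{x}}(M);$ (1.8)

$\mathcal{RA}^{**}_a(\alpha,\vec{x}) = \int_{\mathcal{M}_k} L^{**}_{\alpha,\vec{x}}(M)\, u_{\mathcal{M}_k}(M)\, dM;$ (1.9)

$\mathcal{AG}^{**}_b(\alpha,\vec{x}) = \max_{M \in \mathcal{M}_k} U^{**}_{\alpha,\vec{x}}(M);$ (1.10)

$\mathcal{AG}^{**}_w(\alpha,\vec{x}) = \min_{M \in \mathcal{M}_k} U^{**}_{\alpha,\vec{x}}(M);$ (1.11)

$\mathcal{AG}^{**}_a(\alpha,\vec{x}) = \int_{\mathcal{M}_k} U^{**}_{\alpha,\vec{x}}(M)\, u_{\mathcal{M}_k}(M)\, dM.$ (1.12)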



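Lemma 1 The following statements hold.

$\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}_b(\alpha,\vec{x}) = 1 - \max_{\vec{x} \in \mathcal{P}_k} \mathcal{AG}^{**}_b(\alpha,\vec{x});$ (1.13)

$\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}_w(\alpha,\vec{x}) = 1 - \max_{\vec{x} \in \mathcal{P}_k} \mathcal{AG}^{**}_w(\alpha,\vec{x});$ (1.14)

$\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}_a(\alpha,\vec{x}) = 1 - \max_{\vec{x} \in \mathcal{P}_k} \mathcal{AG}^{**}_a(\alpha,\vec{x});$ (1.15)

$\max_{\vec{x} \in \mathcal{P}_k} \mathcal{AG}_b(\alpha,\vec{x}) = 1 - \min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}^{**}_b(\alpha,\vec{x});$ (1.16)

$\max_{\vec{x} \in \mathcal{P}_k} \mathcal{AG}_w(\alpha,\vec{x}) = 1 - \min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}^{**}_w(\alpha,\vec{x});$ (1.17)

$\max_{\vec{x} \in \mathcal{P}_k} \mathcal{AG}_a(\alpha,\vec{x}) = 1 - \min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}^{**}_a(\alpha,\vec{x}).$ (1.18)

Proof: Straightforward. □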
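The one-line proof can be unpacked as follows for statement (1.13); the other five statements follow the same pattern. This is a sketch of the reasoning, assuming that (1.10) defines the starred aggressive best-case quantity as the maximum over feasible tables of the "more than" probability. It uses only the fact that the entries of every feasible table sum to one.

```latex
\begin{align*}
L_{\alpha,\vec{x}}(M) + U^{**}_{\alpha,\vec{x}}(M)
  &= \sum_{\vec{\delta} \in \Delta^k} M_{\vec{\delta}} = 1
  && \text{for every } M \in \mathcal{M}_k, \\
\mathcal{RA}_b(\alpha,\vec{x})
  &= \min_{M \in \mathcal{M}_k} \bigl(1 - U^{**}_{\alpha,\vec{x}}(M)\bigr)
   = 1 - \max_{M \in \mathcal{M}_k} U^{**}_{\alpha,\vec{x}}(M)
   = 1 - \mathcal{AG}^{**}_b(\alpha,\vec{x}), \\
\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}_b(\alpha,\vec{x})
  &= 1 - \max_{\vec{x} \in \mathcal{P}_k} \mathcal{AG}^{**}_b(\alpha,\vec{x}).
\end{align*}
```

The first line holds because the index sets for "at most" and "more than" partition all cells of the table.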

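In light of Lemma 1, to solve the risk profile problem, it suffices to
show how to compute

$\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}_b(\alpha,\vec{x})$, $\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}_w(\alpha,\vec{x})$, $\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}_a(\alpha,\vec{x})$,
$\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}^{**}_b(\alpha,\vec{x})$, $\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}^{**}_w(\alpha,\vec{x})$, $\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}^{**}_a(\alpha,\vec{x})$.

The techniques for computing the latter three expressions are
essentially the same as those for computing the former three. Furthermore,
the techniques for computing the first expression are almost identical
to those for computing the second. For these reasons, the remainder
of our discussion focuses on how to compute $\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}_w(\alpha,\vec{x})$ and
$\min_{\vec{x} \in \mathcal{P}_k} \mathcal{RA}_a(\alpha,\vec{x})$.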

3. The Two-Stock Case 

This section assumes that $k = 2$, i.e., there are only two stocks under
consideration. In the case of two stocks, we can visualize the problems
under consideration as in Figure 1.1. The discrete and finite set of
possible return pairs for the two stocks in the portfolio is shown as
the dots in this picture. Each pair has a probability (from the joint
distribution) associated with it, with the given restrictions on column
and row sums. A given portfolio and target return $\alpha$ define a half-space
on the set of return pairs, with the shaded area in Figure 1.1 giving
the area in which the total return is $\le \alpha$. The problem of computing
$\mathcal{RA}_w(\alpha,\vec{x})$ then is the problem of determining which feasible assignment
of joint probabilities places the highest total probability in the shaded
region.

3.1. A Worst-Case or Best-Case Investor

Given a target return $\alpha$, this section focuses on how to compute an
optimal portfolio for a worst-case risk-averse investor. The cases of a 
best-case risk-averse investor, a worst-case aggressive investor, and a 
best-case aggressive investor can be solved similarly. 

We first present a basic algorithm to compute $\mathcal{RA}_w(\alpha,\vec{x})$ by
computing a worst-case joint distribution matrix $M$ for $\mathcal{S}_1$ and $\mathcal{S}_2$. For
convenience, we index the entries of $M$ with $\{(i,j) \mid i,j = m_1, \ldots, m_2\}$,
where row $i$ (respectively, column $j$) corresponds to return $i\mu$ of $S_1$
(respectively, $j\mu$ of $S_2$). We model the problem of computing $M$ as a
network flow problem on the graph $G$ defined below:

■ $G$ has $2(m + 1)$ vertices, namely, a source $s$, a sink $t$, and
$v_{m_1}, \ldots, v_{m_2}, w_{m_1}, \ldots, w_{m_2}$, where $v_i$ (respectively, $w_i$)
corresponds to return $i\mu$ of stock $S_1$ (respectively, stock $S_2$).

■ For all $i,j = m_1, \ldots, m_2$, $G$ has (1) the edge $(v_i, w_j)$, which has
capacity $c(v_i, w_j) = 1$ if $x_1 i\mu + x_2 j\mu \le \alpha$ or 0 otherwise; (2) the edge
$(s, v_i)$ with capacity $c(s, v_i) = \mathcal{S}_1(i\mu)$; and (3) the edge $(w_j, t)$ with
capacity $c(w_j, t) = \mathcal{S}_2(j\mu)$.

Figure 1.1. Visualization of the two-stock case

Geometrically, we wish to push as much probability as possible into
the region of $M$ defined by $x_1 i + x_2 j \le \alpha/\mu$. In other words, the value of a
maximum $s$-$t$ flow of $G$ equals $\mathcal{RA}_w(\alpha,\vec{x})$. Thus, it is tempting to use
a maximum flow algorithm to solve this maximum flow problem. The
fastest known algorithm for this problem is due to Goldberg and Rao
[Goldberg and Rao, 1998] and runs in $O^*(m^{8/3})$ time$^1$ for our application
(note that $m$ in this bound is as defined in this work, not the number
of edges, as is typical in general flow discussions). Instead of using this
algorithm, we exploit some structural properties of $G$ to solve the flow
problem using a simple greedy algorithm in $O(m)$ arithmetic operations.
Note that since $G$ may have $O(m^2)$ edges with positive capacity, we
cannot afford to construct the whole $G$ explicitly. The idea of our
$O(m)$-time algorithm can be described as follows.

Starting with $v_{m_2}$, we try to push a flow of $c(s, v_{m_2})$ through
$G$. Assume $c(v_{m_2}, w_{m_1}) = 1$ for simplicity. We consider the path
formed by edges $(s, v_{m_2})$, $(v_{m_2}, w_{m_1})$, $(w_{m_1}, t)$ first. We can push flow
$\min(c(s, v_{m_2}), c(w_{m_1}, t))$ through this path, saturating either $(s, v_{m_2})$
or $(w_{m_1}, t)$. If we saturated $(s, v_{m_2})$ then we next consider the path
$(s, v_{m_2-1})$, $(v_{m_2-1}, w_{m_1})$, $(w_{m_1}, t)$ for pushing additional flow;
however, if we had saturated $(w_{m_1}, t)$ we will next consider the path
$(s, v_{m_2})$, $(v_{m_2}, w_{m_1+1})$, $(w_{m_1+1}, t)$. We continue in this fashion until we
can push no more flow. The only complication is that if at some point we
are considering the path $(s, v_i)$, $(v_i, w_j)$, $(w_j, t)$, and $c(v_i, w_j) = 0$, then
obviously we can't saturate either $(s, v_i)$ or $(w_j, t)$, and we simply
decrease $i$ to next consider the path $(s, v_{i-1})$, $(v_{i-1}, w_j)$, $(w_j, t)$. The details
of this $O(m)$-time algorithm are given in Figure 1.2.

1 We use $O^*(f(n))$ for the "soft-O" notation, which ignores polylogarithmic factors. In bounds
for the approximation algorithms, this notation also ignores factors that depend only on the
approximation bound $\epsilon$.

procedure Greedy-Flow
    F ← 0
    i ← m_2
    cv ← c(s, v_i)
    j ← m_1
    cw ← c(w_j, t)
    loop
        if c(v_i, w_j) = 1 and cw < cv then
            F ← F + cw
            cv ← cv - cw
            j ← j + 1
            if j > m_2 then return F
            cw ← c(w_j, t)
        else
            if c(v_i, w_j) = 1 then
                F ← F + cv
                cw ← cw - cv
            end if
            i ← i - 1
            if i < m_1 then return F
            cv ← c(s, v_i)
        end if
    end loop

Figure 1.2. The procedure Greedy-Flow
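The procedure Greedy-Flow can be transcribed into Python almost line by line. The sketch below rests on several assumptions made here for illustration: the marginals are passed as lists `s1` and `s2` whose position `p` stands for the return `(m1 + p) * mu`, the edge capacity between a row and a column is evaluated on demand rather than stored, the flow accumulator starts at zero, and the function name `greedy_flow` is hypothetical.

```python
from fractions import Fraction


def greedy_flow(s1, s2, x, alpha, mu, m1):
    """Value of a maximum flow in G, i.e. the worst-case loss probability.

    s1[p], s2[p]: probability that stock 1 / stock 2 returns (m1 + p) * mu.
    x: portfolio (x1, x2) with non-negative entries summing to one.
    """
    m = len(s1)

    def connected(i, j):
        # Capacity-1 edge (v_i, w_j) exists iff the return pair is <= alpha.
        return x[0] * (m1 + i) * mu + x[1] * (m1 + j) * mu <= alpha

    i, j = m - 1, 0              # start at the largest row, smallest column
    cv, cw = s1[i], s2[j]        # residual capacities of (s, v_i), (w_j, t)
    flow = 0
    while True:
        if connected(i, j) and cw < cv:
            flow += cw           # saturate (w_j, t), move to the next column
            cv -= cw
            j += 1
            if j >= m:
                return flow
            cw = s2[j]
        else:
            if connected(i, j):
                flow += cv       # saturate (s, v_i)
                cw -= cv
            i -= 1               # move to the next smaller row
            if i < 0:
                return flow
            cv = s1[i]


half = Fraction(1, 2)
uniform = [half, half]
skewed = [Fraction(1, 4), Fraction(3, 4)]
```

Each loop iteration either advances `j` or decreases `i`, so the loop runs at most `2m - 1` times, matching the bound used in the proof of Theorem 2. Exact rational arithmetic is used in the example inputs only to keep comparisons free of rounding effects.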



Theorem 2 Given $\mathcal{S}_1, \mathcal{S}_2$, a valid portfolio vector $\vec{x}$, and $\alpha$ as input,
Greedy-Flow computes the value of a maximum flow of $G$ in $O(m)$
arithmetic operations.

Proof: As a first step we prove that the algorithm computes the
maximal flow. Let $\ell$ be the minimal index such that $(w_\ell, t)$ is not
saturated after termination of the algorithm and $k$ be the minimal index
such that $c(v_k, w_\ell) = 0$. We define a partition $V_1 \cup V_2$ of the nodes by

$V_1 = \{s, v_k, \ldots, v_{m_2}, w_{m_1}, \ldots, w_{\ell-1}\}$, $V_2 = V \setminus V_1$.

It is trivial from the definition of $\ell$ that the edges $e = (w_i, t)$,
$i \in \{m_1, \ldots, \ell - 1\}$, are saturated.

Since $x_1, x_2 \ge 0$, and $k$ is the minimal value such that $c(v_k, w_\ell) = 0$,
we have $c(v_i, w_\ell) = 1$ for $i = m_1, \ldots, k - 1$. Since $(w_\ell, t)$ is not saturated,
all edges $(s, v_i)$, $i \in \{m_1, \ldots, k - 1\}$, must be saturated.

From the definition of $k$ and the non-negativity of the portfolio
vector it is easy to see that edges $e = (v_i, w_j)$ with $i \in \{k, \ldots, m_2\}$,
$j \in \{\ell, \ldots, m_2\}$ and positive capacity cannot exist. Thus, every edge
$e = (x, y)$ with $x \in V_1$ and $y \in V_2$ is saturated. The Max-Flow-Min-Cut
Theorem then implies that the algorithm indeed computes a maximal
flow.

Observing the fact that in each loop iteration either index $i$ is
decremented or index $j$ is incremented, and that there are only $m$ different
values that either $i$ or $j$ can take on before the algorithm terminates,
there are at most $2m - 1$ loop iterations, and the linear running time
bound follows. □

To compute $\inf \{\mathcal{RA}_w(\alpha,\vec{x}) \mid \sum x_i = 1\}$ we have to compute $\mathcal{RA}_w(\alpha,\vec{x})$
for all possible portfolios $(x_1, x_2)$. However, each feasible portfolio
corresponds to a half-space (as in Figure 1.1) defined by a line that goes
through the point $(\alpha, \alpha)$ ($x_1\alpha + x_2\alpha = \alpha$, since $x_1 + x_2 = 1$), so we only
need to consider the $O(m^2)$ distinct subsets of return pairs that can be
defined by a line going through $(\alpha, \alpha)$. We can identify each such
portfolio with a different (non-positive) slope $s_1, \ldots, s_{m^2}$, which we assume
to be sorted in descending order. By using a suitable data structure it
is possible to compute the best portfolio much faster than the obvious
$O(m^3)$ algorithm that starts the greedy algorithm for each slope.

Theorem 3 Given $\mathcal{S}_1$, $\mathcal{S}_2$, and $\alpha$, we can compute in $O(m^2 \log m)$
arithmetic operations a portfolio $(x_1, x_2)$ for a worst-case risk-averse investor
which minimizes equation (1.2).

Proof: Starting with the first slope $s_1$ we build up a binary tree.
Each node is labeled with a pair of two real entries $(e_1, e_2)$. The leaves of the
tree correspond to the rows and the columns in the following way.

Starting from column $m_2$ we add leaves from left to right. We add
leaves with labels $(0, \mathcal{S}_2(m_1\mu)), (0, \mathcal{S}_2((m_1 + 1)\mu)), \ldots, (0, \mathcal{S}_2(j_{m_2}\mu))$,
until we reach a row index $j_{m_2}$ such that $x_1 m_2\mu + x_2(j_{m_2} + 1)\mu > \alpha$,
i.e., this index is the last under the crucial line. To be precise we
let $j_{m_2} = \lfloor (\alpha - x_1 m_2\mu)/(x_2\mu) \rfloor$; note that it may be the case that $j_{m_2} < m_1$,
so this sequence of leaves may be empty. Then we add the leaf
$(-\mathcal{S}_1(m_2\mu), 0)$. Next, we consider column $m_2 - 1$ and add leaves
$(0, \mathcal{S}_2((j_{m_2} + 1)\mu)), \ldots, (0, \mathcal{S}_2(j_{m_2-1}\mu))$, until we reach an index $j_{m_2-1}$
such that $x_1(m_2 - 1)\mu + x_2(j_{m_2-1} + 1)\mu > \alpha$. Then we add the leaf
$(-\mathcal{S}_1((m_2 - 1)\mu), 0)$ and proceed similarly with column $m_2 - 2$. Note
that the order of adding leaves is crucial to this data structure and the
correctness of the algorithm is based on that. Starting from left to right
we group the leaves in pairs and build a parent node for each pair
according to the following rule:

$\mathrm{parent}[(e_1, e_2), (f_1, f_2)] = (e_1 + \min\{e_2 + f_1, 0\}, \max\{e_2 + f_1, 0\} + f_2).$

We build $O(\log m)$ layers iteratively, until we reach a single root node
$(r_1, r_2)$. It is easy to see that this tree-based algorithm imitates the
greedy algorithm described before and that $1 + r_1 = 1 - r_2$ is exactly the
flow value. Building this tree structure takes constant time per tree node,
and since there are $O(m)$ nodes we have a total time of $O(m)$, which is
no better than the time bound of the greedy algorithm. The advantage
is that we can dynamically update this data structure efficiently.

We will first sort all of the $m^2$ possible return pairs by their slope
with the point $(\alpha, \alpha)$, so that as the slope determined by our portfolio
increases we can quickly (in constant time per pair) determine which
pairs are added and which are removed from our half-space of interest.
This takes $O(m^2 \log m)$ time. To update our data structure for each
point insertion/removal, all that is required is swapping the position of
two neighboring leaves. With obvious techniques, the positions of these
two leaves can be found in $O(1)$ time, and we can update the tree by
looking at the path from the two leaves to the root and updating each
node on that path. Each update step requires $O(1)$ operations and the
length of the path is bounded by $O(\log m)$. Since there are at most
$m^2$ point additions and removals, each taking $O(\log m)$ time, it takes at
most $O(m^2 \log m)$ time to consider all possible portfolios. □

3.2. The Average-Case Investor

For the average-case investor ($\mathcal{RA}_a$ or $\mathcal{AG}_a$), we are not interested
in the extremes of the joint distributions, but rather the distribution of
the feasible tables. In this section we consider $Q = L_{\alpha,\vec{x}}(M)$ as a random
variable, where $M$ is drawn from a uniform distribution over the feasible
tables $\mathcal{M}_k$. The definition of $\mathcal{RA}_a(\alpha,\vec{x})$, from (1.3), is then $E[Q]$. We
will see that computing the distribution function of $Q$ is a
computationally difficult problem to solve exactly, but it can be approximated within
a reasonable (polynomial) amount of time.

Theorem 4 Let $\gamma \in [0, 1]$ be an $n$-bit rational. It is #P-hard to compute
the fraction of feasible tables $M \in \mathcal{M}_2$ with

$L_{\alpha,\vec{x}}(M) = \sum_{\vec{\delta} \in L(\alpha,\vec{x})} M_{\vec{\delta}} \le \gamma$

(the integration of the corresponding indicator function, or the
distribution function for $Q$).

Proof: Given positive integers $a_1, \ldots, a_n, b$, it is shown in [Dyer and
Frieze, 1991] that computing the $n$-dimensional volume of the
polyhedron $P$

$\sum_{j=1}^{n} a_j y_j \le b, \quad 0 \le y_j \le 1 \quad (j = 1, \ldots, n)$

is #P-hard. Let $d = \sum_{j=1}^{n} a_j$ and consider the polyhedron

$\sum_{j=1}^{n+1} a_j y_j = d, \quad 0 \le y_j \le 1 \quad (j = 1, \ldots, n + 1),$ (1.19)

where $a_{n+1} = d$. Note that for any valid assignment of values to
$y_1, y_2, \ldots, y_n$ we have $0 \le \sum_{j=1}^{n} a_j y_j \le d$, so there is a $y_{n+1} \in [0, 1]$ that
will satisfy (1.19). Now let $a'_j = a_j/(2d)$ and define a $2 \times (n + 1)$
contingency table by $t_{1j} = a'_j y_j$, $t_{2j} = a'_j(1 - y_j)$, with row sums $(1/2, 1/2)$
and column sums $(a'_1, \ldots, a'_{n+1})$.

To completely define our stock problem, we must also give values for
$\mu$, $\alpha$, the portfolio $\vec{x} = (x_1, x_2)$, and the threshold $\gamma$, which we do as
follows:

$\mu = 1, \quad x_1 = \frac{1}{n+1}, \quad x_2 = \frac{n}{n+1}, \quad \alpha = \frac{2n}{n+1}, \quad \gamma = \frac{b}{2d}.$

It is straightforward to verify from these values that the return pairs
in the critical region (the shaded region in Figure 1.1) are exactly the
entries $t_{1j}$ for $j = 1, \ldots, n$. Therefore, the tables that satisfy our criteria,
that $L_{\alpha,\vec{x}}(M) \le \gamma$, are precisely those with

$\sum_{j=1}^{n} t_{1j} = \sum_{j=1}^{n} a'_j y_j = \frac{1}{2d} \sum_{j=1}^{n} a_j y_j \le \frac{b}{2d}.$

Therefore the feasible tables that meet our criteria are exactly those that
correspond to points in polyhedron $P$, and so the fraction of tables that
meet the criteria is exactly the volume of $P$. □

Following the notation of Dyer, Kannan and Mount [Dyer et al., 1997],
who describe a sampling procedure for contingency tables with integer
entries and large row and column sums ($\ge \Omega(m^3)$), we define

$V(r,c) = \{x \in \mathbb{R}^{m \times m} \mid \sum_j x_{ij} = r_i \text{ for } i = 1, \ldots, m \text{ and } \sum_i x_{ij} = c_j \text{ for } j = 1, \ldots, m\}$

and

$P(r,c) = V(r,c) \cap \{x \mid x_{ij} \ge 0 \text{ for } i = 1, \ldots, m,\ j = 1, \ldots, m\}$

as the contingency polytope. Thus, $V(r,c)$ is the set of matrices with
row and column sums specified by $r$ and $c$ respectively. In our case
$r_i = \mathcal{S}_1(i\mu)$, $c_i = \mathcal{S}_2(i\mu)$ and $P(r,c)$ is the set of joint distributions $\mathcal{M}_k$.
Let $U$ be the lattice

$\{x \in \mathbb{Z}^{m \times m} \mid \sum_j x_{ij} = 0 \text{ for } i = 1, \ldots, m,\ \sum_i x_{ij} = 0 \text{ for } j = 1, \ldots, m\}.$

For $1 \le i \le m - 1$ and $1 \le j \le m - 1$, let $b(ij)$ be the vector in $\mathbb{R}^{m \times m}$
given by $b(ij)_{i,j} = 1$, $b(ij)_{i+1,j} = -1$, $b(ij)_{i,j+1} = -1$, $b(ij)_{i+1,j+1} = 1$
and $b(ij)_{k,\ell} = 0$ for all other indices $k, \ell$. Any vector $x$ in $V(0, 0)$ can be
expressed as a linear combination of the $b(ij)$'s as follows:

$x = \sum_{k=1}^{m-1} \sum_{\ell=1}^{m-1} \left( \sum_{i=1}^{k} \sum_{j=1}^{\ell} x_{ij} \right) b(k\ell).$

It is easy to see that the $b(ij)$ are all linearly independent and the
dimension of $V(r,c)$ and $P(r,c)$ for positive row and column sum vectors
$r$ and $c$ is $(m - 1)^2$ [Dyer et al., 1997]. We will apply the sampling
algorithm pioneered by Dyer, Frieze and Kannan [Dyer et al., 1991] and
later refined in a sequence of papers (see [Kannan, 1994] for an overview)
to sample uniformly at random in $P(r,c)$.
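The expansion of a zero-sum matrix in terms of the vectors b(ij) can be checked numerically: the coefficient of each basis vector is the sum of the entries in the corresponding top-left corner. The Python sketch below does this for a small integer matrix. The function names `basis`, `decompose`, and `recombine`, the 0-based indexing, and the sample matrix are assumptions introduced here for illustration.

```python
def basis(m, k, l):
    """b(kl) as an m x m matrix; k and l are 0-based and run over range(m - 1)."""
    b = [[0] * m for _ in range(m)]
    b[k][l] = 1
    b[k + 1][l] = -1
    b[k][l + 1] = -1
    b[k + 1][l + 1] = 1
    return b


def decompose(x):
    """Coefficient of b(kl): sum of x over rows 0..k and columns 0..l."""
    m = len(x)
    return [[sum(x[i][j] for i in range(k + 1) for j in range(l + 1))
             for l in range(m - 1)]
            for k in range(m - 1)]


def recombine(coeff, m):
    """Rebuild the matrix as the linear combination of the basis vectors."""
    out = [[0] * m for _ in range(m)]
    for k in range(m - 1):
        for l in range(m - 1):
            b = basis(m, k, l)
            for i in range(m):
                for j in range(m):
                    out[i][j] += coeff[k][l] * b[i][j]
    return out


# A 3 x 3 matrix whose rows and columns all sum to zero (hypothetical data).
x = [[1, -3, 2],
     [0, 2, -2],
     [-1, 1, 0]]
coeff = decompose(x)
```

Here `coeff` has (m - 1) x (m - 1) = 4 entries, in line with the stated dimension, and `recombine(coeff, 3)` returns the original matrix. Adding a small multiple of any single basis vector to a feasible table keeps all row and column sums fixed, which is why these vectors are natural step directions for a random walk.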

We sample in the space $V(r,c)$. As mentioned in the introduction, we
know a starting point $z_0$ in $P(r,c)$ (the product of the row and column
sums). It is easy to see that a ball of radius $b^2$ is inside $P(r,c)$ if every
component of $r$ and $c$ is at least $b$. Since in our case $r$ and $c$ sum up
to one, $P(r,c) \subseteq B(0, 1)$. The following theorem is a corollary of the
analysis of the fastest sampling algorithm in convex bodies known so far
by Kannan, Lovász and Simonovits [Kannan et al., 1997].

Theorem 5 We can generate a point in P(r,s), which is almost uni- 
form in the sense that its distribution is at most e away from the uni- 
form in total variation distance. The algorithm uses 0*(^pr) membership 
queries of P(r,s) (each requires 0{m 2 ) arithmetic operations). 

procedure Estimate(x)
    S ← 0
    for ℓ = 1, ..., N do
        Q ← result from sample procedure started at x
        S ← S + L_{α,x}(Q)
    end for
    S ← S/N
    return S

Figure 1.3. The approximation algorithm
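The averaging loop of Estimate is independent of how the samples are produced. The Python sketch below separates the two and, in place of the random walk, plugs in an exact uniform sampler for the special case of 2 x 2 tables, where the feasible set is a line segment parametrized by one cell. This substitution, the function names `sample_2x2`, `estimate`, and `loss`, the fixed seed, and the choice of critical region are all assumptions made here for illustration. The sketch is not the sampler analyzed in Theorem 5.

```python
import random


def sample_2x2(r1, c1, rng):
    """Uniform 2 x 2 table with row sums (r1, 1 - r1) and column sums (c1, 1 - c1).

    The free parameter is t = M[0][0]; all other cells are affine in t,
    so drawing t uniformly gives a uniform point of the feasible segment.
    """
    lo, hi = max(0.0, r1 + c1 - 1.0), min(r1, c1)
    t = rng.uniform(lo, hi)
    return [[t, r1 - t], [c1 - t, 1.0 - r1 - c1 + t]]


def estimate(sampler, statistic, n_samples):
    """Average a statistic over n_samples draws, as in Figure 1.3."""
    total = 0.0
    for _ in range(n_samples):
        total += statistic(sampler())
    return total / n_samples


def loss(M):
    """Toy critical region: every cell except the one where both returns are high."""
    return M[0][0] + M[0][1] + M[1][0]


rng = random.Random(0)
approx = estimate(lambda: sample_2x2(0.5, 0.5, rng), loss, 20000)
```

For these marginals the loss equals one minus `t`, with `t` uniform on [0, 1/2], so the exact average is 3/4 and `approx` should land close to it. In larger tables no such closed-form sampler is available, which is where the random walk on the contingency polytope comes in.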



Theorem 6 Procedure Estimate (in Figure 1.3) computes a number S 
inO* (prsr) arithmetic operations, which approximates TZA a ((X,x) (i.e., 
$\mathcal{RA}_a(\alpha,\vec{x}) - \epsilon \le S \le \mathcal{RA}_a(\alpha,\vec{x}) + \epsilon$) with probability $1 - \delta$.

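Proof: Let $S_k = \frac{1}{k} \sum_{i=1}^{k} \xi_i$, where $\xi_i = L_{\alpha,\vec{x}}(Q_i)$ for the $i$-th
sample $Q_i$. Thus, $E(S_k) = \int L_{\alpha,\vec{x}}(M)\, w(M)\, dM$, where $w$ is the density produced by the random
walk. Since $0 \le L_{\alpha,\vec{x}}(M) \le 1$ for all $M \in \mathcal{M}_2$, it is easy to see that
$\sigma^2(\xi_i) \le 1$ and so $\sigma^2(S_k) \le \frac{1}{k}$. By Chebychev's inequality,

$P(|S_k - E(S_k)| \ge \epsilon/2) \le \frac{4}{k\epsilon^2} \le \delta.$

Since the samples are not entirely uniform, we must consider the error
introduced by the approximately uniform sampling distribution as well.
Let $u_{\mathcal{M}_k}(M)$ denote a uniform density over the set $\mathcal{M}_k$. Then,
approximating a uniform distribution within bound $\epsilon/4$, Theorem 5 implies

$|E(S_k) - \mathcal{RA}_a(\alpha,\vec{x})| = \left| \int L_{\alpha,\vec{x}}(M)\, w(M)\, dM - \int L_{\alpha,\vec{x}}(M)\, u_{\mathcal{M}_k}(M)\, dM \right|$

$\le \int_{w > u_{\mathcal{M}_k}} (w(M) - u_{\mathcal{M}_k}(M))\, dM + \int_{w < u_{\mathcal{M}_k}} (u_{\mathcal{M}_k}(M) - w(M))\, dM$

$\le \epsilon/2.$

Setting $k = \frac{4}{\epsilon^2 \delta}$ the theorem follows. □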

4. The k-Stock Case

In this section we consider the general case of more than two stocks.
Since the problem of estimating the probability distribution for the
average-case investor is already #P-complete in the two-stock case, we
do not consider it any more and concentrate on a worst-case investor.
We start with a complexity result for three stocks, which implies that a
greedy or flow-based portfolio is quite unlikely to exist.

Theorem 7 The existence of a greedy or flow based portfolio for the 
problem with 3 or more stocks implies P = NP. 

Proof: We prove this result by reduction from
NUMERICAL-3-DIM-MATCHING. Consider an instance of
NUMERICAL-3-DIM-MATCHING, i.e., disjoint sets $X_1, X_2, X_3$, each containing $m$ elements,
a size $s(a) \in \mathbb{Z}^+$ for each element $a \in X_1 \cup X_2 \cup X_3$ and a bound $B \in \mathbb{Z}$.
We would like to know if $X_1 \cup X_2 \cup X_3$ can be partitioned into $m$ disjoint
sets such that each of these sets contains exactly one element from each
of $X_1$, $X_2$, and $X_3$, and the sum of the sizes of its elements is exactly $B$ (we can
change this requirement to $\le B$ without difficulty). This problem is
NP-complete in the strong sense, so we restrict the sizes to be bounded
by a polynomial, $s(a) \le n^c$ for some constant $c$.

We construct an instance of the problem of computing
$\mathcal{RA}_w(\alpha, (1/3, 1/3, 1/3))$ by making a contingency table in which
$\mathcal{S}_k(i) = c_{k,i}/m$, where $c_{k,i}$ is the number of items in set $X_k$ with value $i$.
The existence of a greedy or flow-based algorithm implies the existence
of a solution in which all entries in the solution table are multiples of
$1/m$, and such a solution exists with $L_{\alpha,\vec{x}}(M) = 1$ if and only if there
is a valid partition of $X_1 \cup X_2 \cup X_3$. If such a partition exists, we can
find it by simply taking all of the triples "selected" (with multiplicity
determined by the integer multiple of $1/m$), and using elements from $X_1$,
$X_2$, and $X_3$ as determined by the three coordinates of each selected
point. □

While this proof shows that it is unlikely that a fast and simple greedy
or flow-based algorithm exists, as it does for 2 stocks, we can indeed solve
the problem for a fixed number of stocks in polynomial time using a more
time-consuming procedure based on linear programming. This is stated
in a general setting in the following theorem.

Theorem 8 If the number of stocks k is part of the input, the problem 
of determining the best portfolio for a worst-case investor can be solved 
in time polynomial in the number of entries of the contingency table (but 
exponential in k). 

Proof: The problem can be modeled as a linear program with a
number of variables that corresponds to the number of entries of the
contingency table, and $km$ inequalities. □
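The proof does not spell the linear program out. One natural reading, offered here as a sketch for a fixed portfolio rather than as the authors' exact formulation, takes one variable per cell of the k-dimensional table and one marginal condition per stock and return value, which gives the variable and constraint counts quoted in the proof:

```latex
\begin{align*}
\text{maximize}\quad
  & \sum_{\vec{\delta} \in L(\alpha,\vec{x})} M_{\vec{\delta}} \\
\text{subject to}\quad
  & \sum_{\vec{\delta} \in \Delta^k:\ \delta_j = \beta} M_{\vec{\delta}} = \mathcal{S}_j(\beta)
  && \text{for all } \beta \in \Delta,\ j = 1, \ldots, k, \\
  & M_{\vec{\delta}} \ge 0
  && \text{for all } \vec{\delta} \in \Delta^k .
\end{align*}
```

The optimum of this program is the worst-case loss probability of definition (1.2) for the given portfolio. It has $m^k$ variables and $km$ marginal conditions, so its size is polynomial in the number of table entries but exponential in $k$.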

4.1. An Approximation Algorithm 

In this section we describe an approximation algorithm that solves
the problem of determining the worst-case probability for a given
portfolio within a given error $\epsilon \in \mathbb{R}^+$ in polynomial time. Additionally, we
describe an important, non-trivial special case, where the problem can
be solved exactly in polynomial time.

Theorem 9 Suppose that a portfolio $(x_i)_{i=1}^{k}$ and a target return $\alpha$ are
given. The worst-case probability can be approximated (i.e., we compute
a value $W$ with $\mathcal{RA}_w(\alpha,\vec{x}) - \epsilon \le W \le \mathcal{RA}_w(\alpha,\vec{x}) + \epsilon$) in time polynomial
in $k$ and $n$. The number of steps is dominated by solving a linear program
in $O(km^2/\epsilon^2)$ variables and $O(km/\epsilon)$ constraints.

Proof: We consider the first pair of stocks $S_1$ and $S_2$ as in the
two-dimensional case and define a new portfolio as $\tilde{x}_1 = \frac{x_1}{x_1 + x_2}$ and
$\tilde{x}_2 = \frac{x_2}{x_1 + x_2}$. We divide the two-dimensional plane into $\ell = \frac{1}{\epsilon} m \log k$
regions by $\ell$ parallel lines $\tilde{x}_1 x + \tilde{x}_2 y = \mathrm{const}$ at constant distance. Thus,
we divide the entries of the joint distribution matrix into $\ell$ different sets
(see Figure 1.4).

Each entry in the matrix corresponds to a variable and the variables
satisfy the row sum and column sum condition of the joint distribution.
Next, we sum up the entries in the $\ell$ different sets and assign the sums
to $\ell$ new variables. By combining these sum variables from two
different pairs of stocks, we get a new table with new row and column sum
conditions, resulting again in $\ell$ new sum variables.

Repeating combinations in this manner, we stop after $\log k$ iterations
and the creation of $O(km^2 \log k/\epsilon^2)$ variables and $O(km \log k/\epsilon)$
constraints, leaving just one table with 2 border distributions (expressed as
variables). Assuming that the variables of the border distributions
correspond to the distributions of the stocks $S_1, \ldots, S_{k/2}$ and $S_{k/2+1}, \ldots, S_k$,
we do the following.

Figure 1.4. Striping idea used in worst-case approximation construction

We define a portfolio $\tilde{x}_1 = \sum_{i=1}^{k/2} x_i / \sum_{i=1}^{k} x_i$ and
$\tilde{x}_2 = \sum_{i=k/2+1}^{k} x_i / \sum_{i=1}^{k} x_i$ for our
last table and consider the line $\tilde{x}_1 x + \tilde{x}_2 y = \alpha$, dividing our last table into
two sets. The variables below that line are summed up and we solve a
linear program by maximizing this sum subject to the constraints created
before. Since we reduced the number of entries in each table from $\Omega(m^2)$
to only $\ell$, that are considered in the next table, we lost some precision
during the combination. But, after the first pairing in the lowest level
of the binary tree, each sum variable represents a loss probability of the

combination of the two stocks within an error of ] _£ t%- Furthermore, 

it is easy to see that during the repeated combination of the stocks the
error accumulates linearly in each iteration. Thus, the theorem follows.



Theorem 10 Suppose that a portfolio $(x_i)_{i=1}^{k}$ and a target return
probability $p$ are given. Under the assumption that the dollar to be
invested can only be broken into a fixed number $c$ of equal units (cents),
the worst-case probability can be computed exactly in time polynomial in
$k$ and $m$.



Proof: The proof is based on a similar construction as the
approximation algorithm and is omitted for brevity. □